Topical Word Embeddings
Authors
Yang Liu, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
Abstract
Most word embedding models represent each word with a single vector, which leaves them unable to discriminate among the multiple senses of ubiquitous homonymous and polysemous words. To enhance discriminativeness, we employ latent topic models to assign a topic to each word occurrence in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be obtained flexibly to measure contextual word similarity. We can also build document representations that are more expressive than widely used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models, including a multi-prototype version, on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from https://github.com/largelymfs/topical_word_embeddings.
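As a concrete illustration of the pipeline the abstract describes, here is a minimal sketch assuming gensim: LDA assigns each token its most probable topic, tokens are rewritten as word#topic pseudo-words, and skip-gram then learns one embedding per (word, topic) pair. The toy corpus, the argmax tagging via p(w|z)·p(z|d), and all hyperparameters are illustrative assumptions, not the paper's actual training procedure (which samples topics with collapsed Gibbs sampling).

```python
# Minimal sketch of the TWE idea (not the paper's exact procedure):
# tag each token with an LDA topic, then learn one embedding per
# (word, topic) pair with skip-gram over the tagged corpus.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [["apple", "stock", "market", "shares"],   # toy corpus; a real run
        ["apple", "fruit", "juice", "fresh"],     # needs far more text
        ["stock", "price", "market", "shares"]]

dictionary = Dictionary(docs)
bows = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bows, id2word=dictionary, num_topics=2, passes=50, random_state=0)

def assign_topic(word, bow):
    """Hard-assign a token's topic via argmax of p(w|z) * p(z|d).
    (A simplification of the paper's Gibbs-sampled assignments.)"""
    p_z_d = np.zeros(lda.num_topics)
    for z, p in lda.get_document_topics(bow, minimum_probability=0.0):
        p_z_d[z] = p
    p_w_z = lda.get_topics()[:, dictionary.token2id[word]]
    return int(np.argmax(p_w_z * p_z_d))

# Rewrite each token as a "word#topic" pseudo-word, then train skip-gram,
# so the two senses of "apple" can receive different vectors.
tagged = [[f"{w}#{assign_topic(w, bow)}" for w in doc]
          for doc, bow in zip(docs, bows)]
twe = Word2Vec(tagged, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(twe.wv.most_similar(tagged[0][0], topn=3))
```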
Similar Resources
Dual Embeddings and Metrics for Relational Similarity
In this work, we study the problem of relational similarity by combining different word embeddings learned from different types of contexts. The word2vec model with linear bag-of-words contexts captures more topical and less functional similarity, while dependency-based word embeddings with syntactic contexts capture more functional and less topical similarity. We explore to...
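The combination this snippet hints at can be sketched as below, assuming two pre-trained spaces are available on disk; the file names, the offset-cosine scoring, and the interpolation weight alpha are illustrative assumptions, not the paper's metric.

```python
# Sketch: interpolate a bag-of-words space (more topical) with a
# dependency-based space (more functional) when scoring how similarly
# two word pairs are related. Paths and alpha are placeholders.
import numpy as np
from gensim.models import KeyedVectors

bow_vecs = KeyedVectors.load_word2vec_format("bow_vectors.txt")  # hypothetical file
dep_vecs = KeyedVectors.load_word2vec_format("dep_vectors.txt")  # hypothetical file

def offset_cosine(kv, a, b, c, d):
    """cosine(a - b, c - d): do both pairs encode the same relation?"""
    u, v = kv[a] - kv[b], kv[c] - kv[d]
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def relational_similarity(pair1, pair2, alpha=0.5):
    (a, b), (c, d) = pair1, pair2
    return (alpha * offset_cosine(bow_vecs, a, b, c, d)
            + (1.0 - alpha) * offset_cosine(dep_vecs, a, b, c, d))

print(relational_similarity(("king", "queen"), ("man", "woman")))
```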
Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
We propose a Word-Topic Mixture (WTM) model to improve word representations and the topic model simultaneously. First, it introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on the Latent Dirichlet Allocation (LDA) model, to learn word embeddings and topic vectors. Then the results learned from TWE are integrated into LDA by defining the probability distrib...
Dependency-Based Word Embeddings
While continuous word embeddings are gaining popularity, current models are based solely on linear contexts. In this work, we generalize the skip-gram model with negative sampling introduced by Mikolov et al. to include arbitrary contexts. In particular, we perform experiments with dependency-based contexts and show that they produce markedly different embeddings. The dependency-based embedding...
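A sketch of what "arbitrary contexts" means in practice, assuming spaCy as the parser (a stand-in; the original work used a different one): each word's contexts become its dependency neighbours labelled with the relation, instead of a linear window. The resulting (word, context) pairs would then feed a skip-gram trainer that accepts arbitrary contexts.

```python
# Sketch: derive syntactic (word, context) pairs from a dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def dependency_contexts(sentence):
    pairs = []
    for tok in nlp(sentence):
        if tok.dep_ == "ROOT":
            continue
        # The word sees its head through the relation label; the head
        # sees the word through the inverse relation (marked "-1").
        pairs.append((tok.text, f"{tok.head.text}/{tok.dep_}"))
        pairs.append((tok.head.text, f"{tok.text}/{tok.dep_}-1"))
    return pairs

# e.g. ("scientist", "discovers/nsubj-1"), ("discovers", "scientist/nsubj"), ...
print(dependency_contexts("Australian scientist discovers star with telescope"))
```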
Sentence Similarity Measures for Fine-Grained Estimation of Topical Relevance in Learner Essays
We investigate the task of assessing sentence-level prompt relevance in learner essays. Various systems using word overlap, neural embeddings and neural compositional models are evaluated on two datasets of learner writing. We propose a new method for sentence-level similarity calculation, which learns to adjust the weights of pre-trained word embeddings for a specific task, achieving substantial...
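The weighted-embedding idea can be sketched as follows, assuming gensim KeyedVectors; the weight dictionary is the learnable part the snippet refers to, and the file name is a placeholder.

```python
# Sketch: sentence similarity as the cosine between weighted averages of
# pre-trained word vectors. The per-word weights are what would be learned
# for the target task; weight 1.0 reduces to the unweighted baseline.
import numpy as np
from gensim.models import KeyedVectors

vecs = KeyedVectors.load_word2vec_format("pretrained_vectors.txt")  # placeholder
weights = {}  # task-specific weights, to be learned; missing words default to 1.0

def sentence_vector(tokens):
    rows = [weights.get(w, 1.0) * vecs[w] for w in tokens if w in vecs]
    return np.mean(rows, axis=0) if rows else np.zeros(vecs.vector_size)

def sentence_similarity(sent_a, sent_b):
    u, v = sentence_vector(sent_a), sentence_vector(sent_b)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

print(sentence_similarity(["the", "stock", "market"], ["share", "prices"]))
```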
Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling
Training good word embeddings requires large amounts of data. Out-of-vocabulary words will still be encountered at test time, leaving these words without embeddings. To overcome this lack of embeddings for rare words, existing methods leverage morphological features to generate them. While the existing methods use computationally intensive rule-based (Soricut and Och, 2015) or tool-based ...
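One simple reading of "sub-word similarity based search" can be sketched as below; the character n-gram representation, Jaccard scoring, and averaging over the top-k neighbours are illustrative assumptions, not the paper's exact method.

```python
# Sketch: back off an out-of-vocabulary word to the average vector of the
# in-vocabulary words whose character n-gram sets overlap most with it.
import numpy as np
from gensim.models import KeyedVectors

vecs = KeyedVectors.load_word2vec_format("pretrained_vectors.txt")  # placeholder

def ngrams(word, n=3):
    padded = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def oov_vector(word, k=5):
    """Brute-force Jaccard search over the whole vocabulary: fine for a
    sketch, too slow for production use."""
    target = ngrams(word)
    def jaccard(w):
        grams = ngrams(w)
        return len(target & grams) / len(target | grams)
    nearest = sorted(vecs.key_to_index, key=jaccard, reverse=True)[:k]
    return np.mean([vecs[w] for w in nearest], axis=0)

print(oov_vector("unfriendliness"))
```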